zero-sum stochastic game
A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games
In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with the minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players. To establish the results, we develop a coupled Lyapunov drift approach to capture the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- (6 more...)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.73)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
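As a concrete illustration of the matrix-game building block above, the following minimal sketch runs payoff-based smoothed best-response learning in a single zero-sum matrix game, where each player observes only its own realized payoff. The temperature tau and the step-size schedule are illustrative assumptions, and the full method additionally couples such updates with minimax value iteration across states.

```python
import numpy as np

def smoothed_br(q, tau):
    """Softmax (smoothed best response) over payoff estimates q."""
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))    # player 1 gets A[a1, a2], player 2 gets -A[a1, a2]
tau = 0.1                          # smoothing temperature (illustrative)
q1, q2 = np.zeros(3), np.zeros(3)  # each player's local payoff estimates

for t in range(1, 100_000):
    pi1 = smoothed_br(q1, tau)
    pi2 = smoothed_br(q2, tau)
    a1 = rng.choice(3, p=pi1)
    a2 = rng.choice(3, p=pi2)
    r = A[a1, a2]                  # each player observes only its realized payoff
    lr = t ** -0.6                 # decaying step size (illustrative)
    q1[a1] += lr * (r - q1[a1])    # payoff-based update of the played action only
    q2[a2] += lr * (-r - q2[a2])

print("smoothed strategies:", smoothed_br(q1, tau).round(3), smoothed_br(q2, tau).round(3))
```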
Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games
Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman
We consider two-player zero-sum stochastic games and propose a two-timescale $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players. In two-timescale $Q$-learning, the fast-timescale iterates are updated in the spirit of stochastic gradient descent, while the slow-timescale iterates (which we use to compute the policies) are updated by taking a convex combination of their previous values and the latest fast-timescale iterates. Introducing the slow timescale and its update equation constitutes our main algorithmic novelty. In the special case of linear function approximation, we establish, to the best of our knowledge, the first last-iterate finite-sample bound for payoff-based independent learning dynamics of this type. The result implies a polynomial sample complexity to find a Nash equilibrium in such stochastic games. To establish the results, we model our proposed algorithm as a two-timescale stochastic approximation and derive the finite-sample bound through a Lyapunov-based approach. The key novelty lies in constructing a valid Lyapunov function to capture the evolution of the slow-timescale iterates. Specifically, through a change of variable, we show that the update equation of the slow-timescale iterates resembles the classical smoothed best-response dynamics, where the regularized Nash gap serves as a valid Lyapunov function. This insight enables us to construct a valid Lyapunov function via a generalized variant of the Moreau envelope of the regularized Nash gap. The construction of our Lyapunov function might be of broad independent interest in studying the behavior of stochastic approximation algorithms.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.80)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)
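The two-timescale structure described in this abstract can be sketched in the simplest tabular, single-state case (the paper itself treats linear function approximation, so the names and step-size choices below are illustrative assumptions): fast iterates move SGD-style toward observed payoffs, while slow iterates, which define the softmax policies, track the fast iterates through a convex combination.

```python
import numpy as np

def softmax(q, tau=0.1):
    z = np.exp((q - q.max()) / tau)
    return z / z.sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))    # zero-sum payoff matrix for player 1
f1, f2 = np.zeros(4), np.zeros(4)  # fast-timescale iterates
s1, s2 = np.zeros(4), np.zeros(4)  # slow-timescale iterates (define the policies)

for t in range(1, 200_000):
    alpha = t ** -0.6              # fast step size
    beta = t ** -0.9               # slow step size; beta / alpha -> 0
    pi1, pi2 = softmax(s1), softmax(s2)
    a1 = rng.choice(4, p=pi1)
    a2 = rng.choice(4, p=pi2)
    r = A[a1, a2]                  # realized payoff of the joint action
    # fast timescale: SGD-style move of the played action toward the payoff
    f1[a1] += alpha * (r - f1[a1])
    f2[a2] += alpha * (-r - f2[a2])
    # slow timescale: convex combination of the previous slow iterate
    # and the latest fast-timescale iterate
    s1 = (1 - beta) * s1 + beta * f1
    s2 = (1 - beta) * s2 + beta * f2

print("policies:", softmax(s1).round(3), softmax(s2).round(3))
```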
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations
This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as function approximators. In our setting, the model and algorithm do not decouple by agent. In order to find Nash Equilibria in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games and show the theoretically appealing property that its objective function has no local optima. In our numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to previous approaches using tabular representations. Moreover, even with sub-optimal expert demonstrations, our algorithms recover reward functions and strategies of good quality.
- North America > United States > Illinois > Cook County > Evanston (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
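To make the "experts versus Nash Equilibrium" objective concrete in the simplest matrix-game case, one can measure how exploitable the expert strategies are under a candidate reward; an IRL outer loop would then adjust reward parameters to shrink this gap. The sketch below is our illustration under that simplification, not the paper's deep-network algorithm: it computes the game value by linear programming and the experts' exploitability gap.

```python
import numpy as np
from scipy.optimize import linprog

def game_value(R):
    """Nash value of the zero-sum matrix game R (row player maximizes)."""
    m, n = R.shape
    # variables: row strategy x (length m) and scalar value v; minimize -v
    c = np.concatenate([np.zeros(m), [-1.0]])
    A_ub = np.hstack([-R.T, np.ones((n, 1))])  # v <= x @ R[:, j] for every column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # x sums to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return -res.fun

def expert_gap(R, x_e, y_e):
    """Exploitability of expert strategies (zero iff they form a Nash equilibrium)."""
    return np.max(R @ y_e) - np.min(x_e @ R)

rng = np.random.default_rng(0)
R = rng.standard_normal((3, 3))
x_e = np.array([0.5, 0.3, 0.2])  # sub-optimal expert strategies (made up)
y_e = np.array([0.2, 0.2, 0.6])
print("game value:", round(game_value(R), 3))
print("expert exploitability gap:", round(expert_gap(R, x_e, y_e), 3))
# An IRL outer loop would adjust reward parameters to shrink this gap,
# so the observed experts look as close to equilibrium as possible.
```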
The Steering Approach for Multi-Criteria Reinforcement Learning
We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem is discussed as well.
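A caricature of the steering idea in a toy vector-reward bandit (standing in for the stochastic game): the direction from the running average reward vector to the target set scalarizes the vector reward, and an ordinary scalar learner, here epsilon-greedy over empirical means, chases that scalarization. The target set, arm statistics, and learner are all our own illustrative choices, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
arm_means = np.array([[1.0, 0.0],  # each arm yields a noisy 2-dimensional reward
                      [0.0, 1.0],
                      [0.4, 0.4]])
target = np.array([0.5, 0.5])      # goal for the long-term average reward vector

avg = np.zeros(2)                  # running average reward vector
est = np.zeros((3, 2))             # empirical mean reward vector per arm
counts = np.zeros(3)

for t in range(1, 50_000):
    direction = target - avg       # steering direction toward the target set
    scores = est @ direction       # scalarized value of each arm along it
    if rng.random() < 0.05:        # epsilon-greedy scalar learner
        a = rng.integers(3)
    else:
        a = int(np.argmax(scores))
    r = arm_means[a] + 0.1 * rng.standard_normal(2)
    counts[a] += 1
    est[a] += (r - est[a]) / counts[a]
    avg += (r - avg) / t

print("long-run average:", avg.round(3), "target:", target)
```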